probes and of the number and the length of cloned fragments of genomic DNA in order to achieve as complete a
sequence as possible. Said addition does not change the essence of the method of achieving this or the determination
of sequences of replicated fragments of genomic DNA and thus also of the sequences of the entire genomic DNA
or parts thereof of one or several species by arranging positively hybridized oligonucleotide probes via overlapping
sequences and by carrying out the hybridization under conditions in which these probes hybridize only to fully
homologous sequences.

Because this addition includes several changes and clarifications, for greater clarity the addition is written
in the form of a completely new DESCRIPTION which consists of 8 [Serbo-Croatian] pages. In the new
description, we did not include the use of plasmid vectors and of amplification as possible methods for obtaining
replicated fragments of genomic DNA. Competitive hybridization is the only substantially new part of the entire
procedure. Moreover, the PATENT CLAIMS have been modified and now consist of 5 claims, and the
ABSTRACT is new.

Applicant: 
(Signed:)

Prof. Dr. Vladimir Glisin 
Director 



AFF001975 






DESCRIPTION 

Compared to prior-art sequencing methods, our procedure is based on an entirely different logic and is
applicable only to the determination of sequences of complex DNA fragments and/or molecules (more than one
million base pairs). It is based on highly specific hybridization of oligonucleotide probes (ONPs) with a length of
11 to 20 nucleotides.

Conditions for ONP hybridization have been found which differentiate between complete homology with
a target and nonhomology in a single base [Wallace, R.B., et al., Nucleic Acids Res. 6, 3543-3557 (1979)]. When
the hybridization method with 3 M tetramethylammonium chloride is used, the melting point of the hybrid depends
only on the ONP length, regardless of the GC composition [Wood, W.I., et al., Proc. Natl. Acad. Sci. USA 82,
1585-1588 (1985)]. Hence, by hybridization under these conditions, sequences are determined unambiguously. By
hybridization of genomic DNA, replicated in subclones (SCs) of appropriate length, with a sufficient number of
ONPs and by computerized arrangement of the detected sequences, it is possible to sequence the entire genome at
the same time. We believe that this procedure is an order of magnitude faster and less expensive than the one
now being developed and that for this reason it is applicable to the sequencing of genomes of all characteristic
species.

For this procedure, it is necessary to optimize the sequence length, the number of ONPs, the number of 
SCs and the length of the pooled DNA that can represent a hybridization spot. 

11-meric ONPs are the shortest ONPs that can currently be successfully hybridized. This would a priori
mean that 4^11, or 4,194,304, ONPs are necessary to detect each sequence. The same number of independent
hybridizations would be required for each SC or SC pool (group). Positively hybridizing ONPs would arrange
themselves over overlapping 10-mers. In this manner, the DNA sequence of the given SC would be obtained.

The process of SC sequence arrangement is interrupted when the overlapping 10-mer is repeated in the
given SC. In this manner, uninterrupted sequences are obtained only between repeated 10-mers or longer 
oligonucleotide sequences (ONSs). These SC sequence fragments (SFs) cannot always be arranged into an 
unambiguous linear array without additional information. For this reason, it is important to determine the probable 
number of SFs (Nsf) for a given DNA length by use of probability calculations. 

ONSs are distributed in a randomly formed long DNA sequence according to a binomial distribution. The
average distance (A) between identical neighboring ONSs depends only on the ONS length (L) and is obtained from
the expression A = 4^L. The probability that the ONS will be repeated N times in a fragment having a length of
Lf base pairs (bp) is given by the equation

P(N, L_f) = C(N, L_f) \times (1/A)^{N} \times (1 - 1/A)^{L_f - N}    (1)

wherein C(N,Lf) is the number of combinations of class N of Lf elements. The expected number of different ONSs
of length L (that is, of average distance A) that repeat themselves N times in a fragment of Lf bp is given by the
product P(N,Lf) x A. If the arrangement of sequences is done via an overlapping ONS of length L0, or average
distance A0, then Nsf in the fragment Lf is given by

N_{sf} = 1 + A_0 \times \sum_{N \ge 2} N \times P(N, L_f)    (2)

In the event that all (4x10^6) 11-mers are used, about 3 SFs are expected in a 1.5 kb-long Lf. We shall
return to the problem of SF arrangement later.
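
As a rough numerical check of Eqs. 1 and 2, the following Python sketch evaluates the expected number of SFs. It assumes that Eq. 2 sums N x P(N,Lf) over N >= 2 and that A0 = 4^(L-1) when the overlap is one base shorter than the probe. The function name expected_sf and the truncation of the sum at max_repeats are illustrative choices, not part of the original text.

```python
from math import comb

def expected_sf(lf, a0, max_repeats=30):
    """Expected number of sequence fragments (Eq. 2, as read here).

    lf -- fragment length in bp
    a0 -- average distance between identical overlap ONSs
    The sum over N is truncated at max_repeats (an assumption); higher
    terms are negligible for the fragment lengths discussed in the text.
    """
    p = 1.0 / a0
    total = 0.0
    for n in range(2, min(lf, max_repeats) + 1):
        # Eq. 1: binomial probability that a given ONS occurs n times in lf bp
        prob = comb(lf, n) * p**n * (1.0 - p) ** (lf - n)
        total += n * prob
    return 1.0 + a0 * total

# All 11-mers overlap through 10-mers, so A0 = 4**10; fragment of 1.5 kb
print(round(expected_sf(1500, 4**10), 2))
# Probes informative for 8-mers overlap through 7-mers (A0 = 4**7); 200 bp
print(round(expected_sf(200, 4**7), 2))
```

Under these assumptions both cases come out slightly above 3, in line with the figure of about 3 SFs quoted for a 1.5-kb fragment.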

The number of 4x10^6 syntheses to obtain all 11-meric ONPs is uneconomical for the practical use of
sequencing by hybridization (SBH). Deleting a significant number of ONPs (more than 20%) is not advantageous
because it leads to unread gaps in the sequence. A much better method for reducing the number of
ONP syntheses and independent hybridizations is by use of arranged ONP groups. In this case, shorter fragments
must be sequenced, but there are no gaps in the sequence. The number of syntheses and hybridizations is reduced
40-fold, but 7 times more SCs are needed.

From the standpoint of information, the use of arranged ONP groups is the same as the use of shorter
ONPs. For example, there are 65,536 different 8-meric ONSs. Since according to our current knowledge an ONS
8-mer cannot form a stable hybrid, a group of 11-mers can be used as an equivalent. Common to all 11-mers in
the group is one 8-mer, so that information is obtained only about its presence or absence in the target DNA. The
anticipated groups of 11-mers each contain 64 ONPs of the type (N2)N8(N1) (the 5'-3' orientation is in the writing
direction; (Nx) denotes x unspecified bases and Ny denotes y specific bases). With about 65,000 such groups, all
sequences are detected. Based on Eq. 2, we find that an average of 3 SFs is expected in 200 bp-long DNA
fragments. Because of variability, some fragments of this length will have 10 or more SFs.
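
The composition of one arranged group can be illustrated with a short Python sketch. It expands a group of the (N2)N8(N1) type into its individual 11-mers; the function name probe_group and the example core are hypothetical and serve only to show where the 64 members come from (4^2 choices on the 5' side times 4 choices on the 3' side).

```python
from itertools import product

BASES = "ACGT"

def probe_group(core, n5=2, n3=1):
    """All probes sharing `core`, with n5 unspecified bases at the 5' end
    and n3 unspecified bases at the 3' end."""
    return [
        "".join(left) + core + "".join(right)
        for left in product(BASES, repeat=n5)
        for right in product(BASES, repeat=n3)
    ]

group = probe_group("TTAAAAGG")        # example 8-mer core
print(len(group))                      # 64
print(group[0], group[-1])             # AATTAAAAGGA TTTTAAAAGGT
```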

ONPs of type (N2)N8(N1) are not very suitable for sequencing mammalian DNA because of the nonrandom
GC and dinucleotide composition of this DNA. The common sequence of the ONP group must be longer if it
contains more AT bases. Taking this into consideration, it is advantageous to use three kinds of probes: (N1)N10(N1)
where N10 are all the 10-mers not containing G and C, (N1)N9(N1) where N9 are all the 9-mers with 1 or 2 C+G
and (N2)N8(N1) where N8 are all the 8-mers containing 3 or more C+G. About 81,000 such ONP groups are
needed. Their average A0 is about 30,000. For the same A0 value in random DNA, about 130,000 ONPs of
type (N2)N8(A or T) and (N2)N8(C or G) are required. The given ONP groups permit the sequencing of 300-bp
fragments with an average of 3 SFs. As a result of this 25% increase in the number of syntheses, the number of
required SCs is reduced several times (to be discussed later).
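
The sizes of the three probe classes can be checked by direct counting. The sketch below assumes the plain reading of the class definitions (10-mers with no G or C, 9-mers with one or two G or C, 8-mers with three or more G or C); the helper name kmers_with_gc is illustrative.

```python
from math import comb

def kmers_with_gc(k, gc_min, gc_max):
    """Number of k-mers whose count of G-or-C bases lies in [gc_min, gc_max]."""
    # Choose the G/C positions; each position then has 2 choices
    # (G or C there, A or T elsewhere), giving comb(k, g) * 2**k per g.
    return sum(comb(k, g) * 2**k for g in range(gc_min, gc_max + 1))

n10 = kmers_with_gc(10, 0, 0)
n9 = kmers_with_gc(9, 1, 2)
n8 = kmers_with_gc(8, 3, 8)
print(n10, n9, n8, n10 + n9 + n8)   # 1024 23040 56064 80128
```

The first two counts agree with the 1024 and 23,040 groups listed in Claim 5. The 8-mer class enumerates to 56,064 under this reading, slightly more than the 55,834 groups given in Claim 5; the text does not explain the difference.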

In addition to these probes, for solving problems of monotonic sequences, for confirmation of insert
terminals and for supplementing information that is lost because of the impossibility of using ONPs that hybridize
with the vector DNA, it is necessary to synthesize an additional 20,000 ONPs.

Monotonic sequences or, in general, one or two bp-long ONSs repeated in tandem (AAAAAAA...,
TCTCTCTCTCT..., TGATGATGATGATGA...) represent a problem in SBH. With the above probes, it is not
possible to determine the length of monotonic sequences that are longer than the common part of the ONP group.
For this reason, for accurate determination of the length of monotonic ONSs that are up to 18 bp long, the following
ONPs must be used: 16 An and Tn ONPs where n denotes 11 to 18 bp, 20 Cn and Gn ONPs where n denotes 9
to 18 bp, 4 (AT)n ONPs where n has the values (12, 14, 16, 18), 25 (AC)n, (AG)n, (TC)n, (TG)n and (CG)n
ONPs where n has the values (10, 12, 14, 16, 18), 60 ONPs of type (N1N2N3)n which include all trinucleotides
and n has the values (12, 15, 18), 180 ONPs of the type (N1N2N3N4)n which include all 4-mers and n has the
values (12, 16, 18), 408 ONPs which include all 15 bp and 18 bp-long tandem 5-mers, 672 ONPs consisting of 18
bp-long tandem 6-mers and 2340 ONPs consisting of 18 bp-long tandem 7-mers. The total number of these ONPs
is 3725.
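
The counts of tandem-repeat probes can be cross-checked by enumerating repeat units. The sketch below assumes that a repeat unit is counted once regardless of rotation (TGA, GAT and ATG give the same tandem array) and that units which are themselves repeats of a shorter unit are excluded; the function name repeat_units is illustrative.

```python
from itertools import product

def repeat_units(k):
    """Distinct tandem-repeat units of length k, counted up to rotation.

    A unit is kept only if it is primitive (all k rotations differ), so
    ATAT is not a 4-mer unit because it is already an AT repeat.
    """
    units = set()
    for letters in product("ACGT", repeat=k):
        s = "".join(letters)
        rotations = {s[i:] + s[:i] for i in range(k)}
        if len(rotations) == k:
            units.add(min(rotations))
    return units

for k in range(1, 8):
    print(k, len(repeat_units(k)))

# Probe counts exactly as listed in the text, in the order given there
stated = [16, 20, 4, 25, 60, 180, 408, 672, 2340]
print(sum(stated))   # 3725
```

Multiplying the unit counts by the number of probe lengths reproduces most of the stated figures: 20 trinucleotide units x 3 lengths = 60, 60 tetranucleotide units x 3 = 180, 204 pentanucleotide units x 2 = 408, and 2340 heptanucleotide units x 1 = 2340. For 6-mers this enumeration gives 670 units against the 672 ONPs stated; the text does not say how the two additional probes arise.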



For the confirmation of the ends of DNA inserts in an SC, it is necessary to synthesize an additional 2048
ONPs of the N6(N5) or (N5)N6 type, where N6 denotes terminal vector sequences and (N5) denotes all the 5-mers
in both [illegible] vector [illegible].

The problem of vector DNA can be solved in two ways. One consists of prehybridization with cold vector
DNA which is 7 bp shorter on both sides of the cloning site. The other method consists of leaving out ONPs that
are complementary to the vector DNA. Because phage vector M13 was chosen as the most advantageous one (to
be discussed later), about 7000 proposed ONPs will not be used. This is a significant percentage (11% of about
65,000 (N2)N8(N1) ONPs). This number can be reduced to about 1% by using, in place of the given 7000 ONPs,
an additional 21,000 ONPs of the (N1)N°N8(N1) type, where N8 denotes the 7000 M13 8-mers and N° denotes
each of the 3 nucleotides that is not present next to the given 8-mer.

Our calculations to date refer to sequencing single-stranded DNA. For the sequencing of double-stranded
DNA, it is not necessary to synthesize both complementary ONPs. The number of required ONPs is thus halved.
Because of the convenience of the M13 system, however, we will stay with the sequencing of single-stranded DNA.
In this case, the use of one half of the ONPs in the SCs will lead to gaps of unread sequences. A gap in a given
SC, however, will be read in the SC containing the complementary strand. In a representative random SC library,
each sequence is represented on average 10 times. Hence, the probability exists that each sequence will be cloned
in both senses, namely that both DNA strands will be read. It is thus possible to use only noncomplementary ONPs
with an increased use of computerization. This means that the total number of required ONPs would be about
50,000. If it were possible to construct an M13 vector that could simultaneously or successively pack both strands,
the use of noncomplementary ONPs would result in no additional requirements.
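
The halving of the probe set can be made concrete with a small Python sketch. It keeps one representative from each pair of mutually complementary sequences; choosing the lexicographically smaller member is an arbitrary convention adopted here, not something the text specifies, and the function names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def noncomplementary_set(k):
    """One representative per pair {s, reverse_complement(s)} of k-mers."""
    kmers = ("".join(t) for t in product("ACGT", repeat=k))
    return {min(s, reverse_complement(s)) for s in kmers}

print(len(noncomplementary_set(8)))   # 32896
print(4**8 // 2)                      # 32768
```

The set is slightly larger than exactly one half because self-complementary sequences (256 of the 8-mers) have no distinct partner.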

All SCs and/or SC groupings (SC pools) hybridize with all anticipated ONPs. In this manner, for each
SC or SC pool we obtain a set of positively hybridizing ONPs. These ONPs are arranged in sequences by
overlapping over the common sequences, which are only one nucleotide shorter than the ONP. For faster detection
of overlapped ONPs, for each synthesized ONP it is necessary to determine in advance which ONPs show maximum
overlap with it. Thus, each ONPx will have its subset of ONPs (ONPa, ONPb, ONPc, ONPd) 5' ONPx 3' (ONPe,
ONPf, ONPg, ONPh). The arrangement is thus achieved by determining which of the four ONPs at the 5' end and which
of the four ONPs at the 3' end hybridized positively to the given SC or SC pool. The arranging continues until two
positive overlapped ONPs are found for the last ONP arranged. When all SFs are extended to a maximum, this
computer-assisted process ends.
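
The arranging step can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions: probes are plain k-mers written in the sense of the target strand, hybridization is error-free, and extension stops as soon as more than one neighbor is positive. The function names are hypothetical, and a closed loop of probes with no branch point would not be reported by this sketch.

```python
def spectrum(seq, k):
    """Set of all k-mers present in seq (the positively hybridizing probes)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def successors(p, positive):
    """Positive probes overlapping p by k-1 bases on its 3' side."""
    return [p[1:] + b for b in "ACGT" if p[1:] + b in positive]

def predecessors(p, positive):
    """Positive probes overlapping p by k-1 bases on its 5' side."""
    return [b + p[:-1] for b in "ACGT" if b + p[:-1] in positive]

def assemble_fragments(positive):
    """Extend each fragment while exactly one overlapping probe is positive."""
    fragments = []
    for p in sorted(positive):
        before = predecessors(p, positive)
        # p opens a fragment unless it is the only continuation of its only predecessor
        if len(before) == 1 and len(successors(before[0], positive)) == 1:
            continue
        frag, cur = p, p
        while True:
            after = successors(cur, positive)
            if len(after) != 1 or len(predecessors(after[0], positive)) != 1:
                break  # end of target or ambiguous overlap: the SF stops here
            cur = after[0]
            frag += cur[-1]
        fragments.append(frag)
    return fragments

unique = "ATGCGTACCTGA"      # no repeated 3-mer: one fragment is expected
repeated = "CATGCAGATGCTT"   # ATGC occurs twice: several SFs are expected
print(assemble_fragments(spectrum(unique, 4)))
print(assemble_fragments(spectrum(repeated, 4)))
```

With the second example the repeated overlap splits the reconstruction into several SFs, which is the situation the following paragraphs address.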

By use of the given ONP groups, the number of SFs is increased for the given DNA length. In the general
case, unambiguous arrangement is possible for a maximum of 3 SFs per SC, counted by the method by which Nsf
was calculated by Eq. 2. Two of these are recognized as the terminal ones and the third is logically in the middle.
The arrangement of SFs cannot be resolved by a suitable SC length, because it would be too short. Our solutions
are: mutual arrangement of SFs and a large number of SCs so that the SC pools, too, can be used as a hybridization
spot, and/or competitive hybridization of labeled and unlabeled ONPs.

To obtain as complete sequences as possible by SBH, and SCs that can be used later, three SC
libraries in the M13 vector are required: one with inserts of 0.5 kb, one with inserts of 7 kb, and one with inserts
of different sizes made up of two sequences which in the genomic DNA are separated by about 100 kb (skipping SCs).
The first library serves primarily for arranging the SFs. These SCs can also be kept for later experimental use.
These SCs participate in the hybridization as pools obtained during phage growth by simultaneous infection or after
the growth. The second library is the basic one. Its SCs with their larger inserts are more suitably stored for further
use. The 7 kb length was chosen as the upper limit for the size of inserts that can successfully be cloned in M13.
The third library serves to correctly link into a single sequence parts of sequences separated by highly homologous
sequences longer than 7 kb and by uncloned DNA fragments.

Following hybridization of SCs of all libraries with all ONPs and after SF computing, the mutual
arrangement of SFs and SCs is undertaken. The basic library is arranged first. The overlapped SCs are detected
through the content of the entire starting SF of the starting SC or parts thereof. Suitable for mammals are, in the
general case, all SFs with a length of about 20 bp and longer. The average SF length of these SCs was calculated
by Eq. 2 and found to be from 2 to 12 bp. This indicates the existence of a sufficient number of SFs of suitable
length. Moreover, the SFs, of which mostly there are two, are known, and one of them follows the starting SF.
In this case, both sequences are examined, one of them being the right one, and the overlapped SCs are detected
via this sequence. The exact displacement of the overlapping SCs relative to the starting SC is determined on the
basis of the remaining SF content. At the same time, by detecting all SCs that overlap with the starting SC, the
SFs of the starting SC are grouped into a linear array of subsets (SSFs). The SSFs are defined by neighboring
endings of overlapped SCs (start-start, start-end or end-end). The SC overlapping process continues via the SF
taken from the most protruding SSF of the most protruding SC. The arranging process is interrupted when the
uncloned part of the DNA is encountered or, as in SF formation, when a repeat sequence longer than 7 kb is
encountered. This procedure affords maximum-size groups of arranged, overlapped 7-kb-long SCs and linearly
ordered SSFs of their SFs.

In this procedure for arranging SFs, the DNA length that includes the SSFs is essential. This length
depends on the number of SSFs, which is equal to the number of SC endings, namely it is twice as large as the
number of SCs. For a representative library of DNA fragments of one million bp, 700 7-kb-long SCs are needed.
This means that the average SSF size is 715 bp. The actual average number of SFs within such an SSF is not even
one tenth of all SFs of the entire 7-kb SC. The actual number is independent of the SC length, namely it depends
only on the SSF length. According to Eq. 2, for a length of 715 bp and an A0 of 30,000 that the anticipated ONPs
have, the expected average number of SFs with an average length of 45 bp is 16.
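
The arithmetic behind the SSF size is short enough to spell out. The sketch below only restates the figures given in the text (one million bp, 700 SCs, 16 SFs per SSF); the variable names are illustrative.

```python
n_bp = 1_000_000        # length of the DNA being sequenced
n_sc = 700              # 7-kb subclones in the basic library

endings = 2 * n_sc      # each SC contributes a start and an end
avg_ssf = n_bp / endings
print(endings, round(avg_ssf))   # 1400 714

# Average SF length if 16 SFs share an SSF of about 715 bp
print(round(715 / 16, 1))        # 44.7
```

The result of roughly 714 bp matches the quoted average SSF size of 715 bp, and 715/16 gives the quoted average SF length of about 45 bp.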

The arrangement of SFs within the SSFs obtained is accomplished via a 0.5-kb SC library. In this
procedure, it is not essential that these be individual SCs; an SC pool can also be used. The SCs in a pool are
informative if they do not overlap with each other. From an information and technology standpoint, a 10-kb pool
of cloned DNA is advantageous, although it does not represent a limit. The required number of these SCs or pools
is such that the maximum size of the SSF they form will not be greater than 300 bp. With the proposed ONPs, we
anticipate within this DNA length 3 SFs (Eq. 2), which, as already explained, can be arranged unambiguously. By
use of the binomial distribution, we derived the equation

N_{sa} = 2 N_{sc} \times (1 - 2 N_{sc}/N_{bp})^{L_{ms}}    (3)

wherein Nsa is the number of SSFs greater than Lms, Nsc is the number of SCs, Nbp is the number of base pairs
in the DNA fragment or molecule being sequenced, and Lms is the SSF size that, on average, gives the arrangement
number of 3 SFs and which in this case is 300 bp. On the basis of this equation, it was determined that 25,000
0.5-kb SCs are needed for a DNA fragment of one million bp. The number of 10-kb pools is 1250. The average
SSF size obtained for this SC is 20 bp.
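
A numerical reading of Eq. 3 is sketched below. The printed equation is difficult to read, so the form used here, Nsa = 2 Nsc (1 - 2 Nsc/Nbp)^Lms, is an assumption: it treats the 2 Nsc subclone endings as randomly placed and counts the gaps between endings that exceed Lms. The function name is illustrative.

```python
def ssf_longer_than(n_sc, n_bp, l_ms):
    """Expected number of SSFs longer than l_ms bp (assumed form of Eq. 3)."""
    endings = 2 * n_sc                    # every SC has two endings
    return endings * (1.0 - endings / n_bp) ** l_ms

n_bp = 1_000_000
n_sc = 25_000
print(ssf_longer_than(n_sc, n_bp, 300))   # well below 1 under this reading
print(n_bp / (2 * n_sc))                  # 20.0 (average SSF size in bp)
print(n_sc * 500 // 10_000)               # 1250 (number of 10-kb pools)
```

The last two lines reproduce the average SSF size of 20 bp and the 1250 pools stated in the text independently of how Eq. 3 is read.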

SF arranging is done by computer-assisted detection of pools containing SCs that overlap the starting SSF
obtained by arranging the basic library. The detection is performed on the basis of the content of the entire
randomly selected SF or part thereof in the starting SSF. Based on the content of the other SFs of the starting SSF,
the size of the overlap of 0.5-kb SCs is determined, and at the same time, because of their high density, the order
of SFs in the starting SSF is also determined. At the end of this process, one obtains the sequence of each group
of arranged 7-kb SCs and an indication as to the pool that contains the 0.5-kb SC that contains the [illegible]
sequence. At a certain small number of locations, the sequence will not be complete or it will be ambiguous. Our
calculations show that this happens on average at less than one location per million bp, the randomly distributed
undetected ONSs amounting to 30%. These locations are sequenced by suitable treatment of the SCs containing
them and repeated application of SBH or by competitive hybridization of suitably selected pairs of unlabeled and
labeled ONPs or by the conventional method or by the advanced conventional method.

The competitive hybridization procedure will be explained on the example of a twice repeated 7-bp
sequence. In this case, two SFs terminate and two start with the repeating sequence TTAAAAGG (underlined
in the original):

5'NNNNNNNNNNNNCA TTAAAAGG 3'
5'NNNNNNNNNNNNCC TTAAAAGG 3'

5'TTAAAAGG TACNNNNNNN 3'
5'TTAAAAGG CCCNNNNNNN 3'

By prehybridization with excess unlabeled ONP, for example 5'(N2)CATTAAAAG(N1)3', which because
of a noncomplementary base cannot hybridize to 5'NNCCTTAAAAGG3', the subsequent hybridization of one of
the two labeled ONPs, i.e., 5'(N2)AAAAGGTAC(N1)3' or 5'(N2)AAAAGGCCC(N1)3', is prevented. The pair
of probes that compete with each other defines the SFs that continue one after the other. This can be confirmed
by an alternative selection of a suitable pair of ONPs. This procedure can be applied to all repeating ONSs with
length up to 18 bp. To apply the procedure to the arranging of the large number of SFs, the prehybridization must
be separated from the hybridization in space and time. For this reason, the stability of the unlabeled ONP is
important. If stability cannot be achieved via ONP concentrations and hybridization temperatures, the cold ONPs
will be covalently bound to the complementary DNA that is bound to the filter by exposure to UV radiation in the
presence of psoralen or by use of ONPs bearing a reactive group capable of covalent binding.
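
The logic of competitive hybridization can be imitated with a toy model in Python. The model is an assumption made for illustration only: probes are written in the sense of the target strand, hybridization requires an exact match of the specified core, and a labeled probe gives no signal at a site that overlaps a site already occupied by the unlabeled probe. The flanking bases in the example target are arbitrary, and the function names are hypothetical.

```python
def sites(target, core):
    """(start, end) intervals of every exact occurrence of core in target."""
    found = []
    i = target.find(core)
    while i != -1:
        found.append((i, i + len(core)))
        i = target.find(core, i + 1)
    return found

def labeled_signal(target, cold_core, labeled_core):
    """True if the labeled probe still finds a site not blocked by the cold probe."""
    blocked = sites(target, cold_core)
    for start, end in sites(target, labeled_core):
        if not any(start < b_end and b_start < end for b_start, b_end in blocked):
            return True
    return False

repeat = "TTAAAAGG"
# Two copies of the repeat: ...CA-repeat-TAC... and ...CC-repeat-CCC...
target = "GGACTGCA" + repeat + "TACGGTCA" + "ACGTCTCC" + repeat + "CCCATGTA"

cold = "CATTAAAAG"    # unlabeled probe specific for the copy preceded by CA
print(labeled_signal(target, cold, "AAAAGGTAC"))   # False: signal lost
print(labeled_signal(target, cold, "AAAAGGCCC"))   # True: signal kept
```

In this toy case the labeled probe that loses its signal identifies TAC as the continuation of the SF ending in CA, which is the inference described in the paragraph above.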

The SCs of the third library are used to bind the sequenced parts into a single DNA fragment. About 170
SCs are needed for one million bp. For larger DNA fragments, the values are directly proportional to this and other
numbers calculated for one million bp. Because these SCs contain sequences that are separated by an average of
100 kb, with them it is possible to skip repeated or uncloned sequences that are up to 100 kb long by finding out
which two sequenced parts contain sequences that are present in an SC of this library.

The experimental requirements of this method are represented by the total number of ONPs, about
50,000 of which are hybridized, and by the number of separate SC samples, 2120, that must be hybridized for a
1-million bp DNA fragment.
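
The figure of 2120 samples per million bp follows from the library sizes given elsewhere in the text (25,000 0.5-kb SCs pooled in groups of 20, 700 7-kb SCs and 170 skipping SCs). A brief Python tally, with illustrative variable names:

```python
small_scs = 25_000      # 0.5-kb subclones per million bp
pool_size = 20          # 0.5-kb subclones applied together as one spot
basic_scs = 700         # 7-kb subclones
skipping_scs = 170      # subclones joining sequences about 100 kb apart

pools = small_scs // pool_size
samples_per_mbp = pools + basic_scs + skipping_scs
print(pools, samples_per_mbp)   # 1250 2120
```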

The described libraries are created in phage vector M13. This vector facilitates the cloning of 100-7000
bp-long inserts and gives a high titer of recombinant phages from bacterial cells by budding without cell lysis.
Centrifuging the bacterial cultures gives a phage preparation that is not contaminated with bacterial DNA, and the
bacterial sediment can be used for another phage production after adding the nutrient medium. By addition of
alkali, DNA separates from the protein envelope and at the same time undergoes denaturation. This results in
effective spotting and covalent binding to nylon filters on which the hybridization is carried out. Hybridization of
one SC with all ONPs requires an amount of DNA equal to that which can be obtained from a few milliliters of
bacterial culture. Most advantageous for growth and robotic application to filters are plates that are similar to
microtitration plates and have appropriate dimensions and holes of appropriate volume.

The DNA is applied to the filters with a robotic arm. An arm with 10,000 suction apertures is sufficient
for sequencing even the largest genomes. After aspiration of the DNA solution from the holes of the "micro" plates,
the suction elements are brought closer to each other by means of a reducing head until they are separated by a
distance of 1 mm. Then, an appropriate amount of DNA and at the same time 10,000 SCs are applied to the filter.
This is repeated on the required number of filters with the same 10,000 SCs. The same is then done with all the
other SCs in groups of 10,000. The number of "prints" of a group of 10,000 SCs for 50,000 ONPs is about 1000,
because each filter can be washed and reused 50 times.

Hybridization is carried out in cycles. One cycle requires one day at the most. During one cycle, all SCs
are hybridized with the defined number of ONPs. To complete all hybridizations within a reasonable length of time,
about 1000 containers with one ONP each are used per cycle. To save on ONPs, a smaller volume of hybridization
liquid is used, and the filters are added in several steps. The filters from all hybridization containers are collected
in one container where all of them are subjected to further treatment at the same time, namely to washing and to
the performance of color reactions when biotinylated rather than radioactively labeled ONPs are used. By carrying
out the hybridization in 20 x 20 x 20 cm containers and without repeating the individual cycle, the SCs required
for sequencing DNAs as large as 10^10 bp can be hybridized.
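
The throughput figures quoted for filter printing and hybridization cycles can be combined in a short calculation. Only the inputs come from the text (50,000 ONPs, 50 reuses per filter, about 1000 ONP containers per cycle, 10,000 SCs per print, 2120 samples per million bp); the derived cycle count and group count are not stated in the original and are shown here as an inference under those inputs.

```python
onps = 50_000                 # probe groups to be hybridized
reuses_per_filter = 50        # washes and reuses of one filter
containers_per_cycle = 1_000  # one ONP per container, one cycle per day at most
spots_per_print = 10_000      # SCs applied by the robotic arm at once
samples_per_mbp = 2_120       # hybridization samples per million bp

prints_per_group = onps // reuses_per_filter   # filters needed per group of SCs
cycles = onps // containers_per_cycle          # cycles needed to use every ONP
print(prints_per_group, cycles)                # 1000 50

genome_bp = 10**10
samples = samples_per_mbp * genome_bp // 10**6
groups = -(-samples // spots_per_print)        # ceiling division
print(samples, groups)                         # 21200000 2120
```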

After each hybridization cycle, the technological procedure continues by reading the hybridization results. 
The data are stored in the computer memory of the computer center. The data are of a binary character (+, -) and
their reading involves several sensitivity thresholds. From these data, at the computer center, the SFs are arranged 
first followed by the mutual arrangement of the SFs and SCs. After all data have been processed, the computer 
center determines which SC must be subjected to what additional treatment to obtain the complete sequence. 

SBH is a method that minimizes experimental work at the expense of more computer work. The only
technological requirement is sequencing by specific ONP hybridization. The non-use of up to 6% of the envisaged
ONPs can be tolerated without the appearance of gaps in the reading of the DNA sequence. To reduce the number
of false negatives (unsuccessful hybridization of ONPs because of their limited hybridization length of 11
nucleotides) and to eliminate false positives, the envisaged ONPs have nonspecified bases at the ends, namely at
the only locations where errors are possible. In place of ONPs of group (N3)N8, ONPs of group (N2)N8(N1) are
used. For this reason, even the ONPs meant for measuring the length of monotonic sequences are synthesized as
(N1)Nx(N1) ONP groups. In the case of certain basic ONP groups which give many false negatives, ONP groups
of the (N2)N8(N2) type are used, and the hybridization is carried out at the temperature used for 11-meric ONPs.

The formation of internal duplexes in the DNA that is bound to the filter is one of the known structural
reasons for false negatives or gaps in sequence reading. This problem is overcome by improving the bonding of
DNA to nylon filters and by cutting the DNA into fragments of an average size of 50 bp (ultrasound, acid,
endonuclease) before applying it to the filter. A significant number of these fragments of the recombinant molecule
will also be cut within the duplex structure. The formation of such a structure is thus prevented, making
hybridization possible.

The approach on which this procedure is based makes it possible for a computer-controlled, fully robotized 
line to produce sufficient data in the form of binary signals from which the sequence of complex DNA fragments 
or molecules can then be obtained by computing. 

PATENT CLAIMS

1. Procedure for sequencing the entire genomic DNA or large parts thereof by hybridization with
oligonucleotide probes, characterized in that the replicated fragments of genomic DNA are hybridized with some
or all 8- to 20-nucleotides-long ONPs resulting from the variation and repetition of the 4 nucleotides A, T or U,
C, G, or their derivatives and analogs, by using individual ONPs or a mixture of individually synthesized ONPs,
or of an arranged ONP group synthesized so that more or all nucleotides or their derivatives or analogs are added
at a certain point during synthesis, and that the hybridization reaction is carried out under conditions in which the
oligonucleotide probes hybridize only with a fully homologous sequence or a sequence that has an amount of
nonhomology that does not cause the formation of ambiguous or faulty sequences in the process of arranging
positively hybridizing ONPs via a maximum mutual overlap of their sequences.

2. Procedure according to Claim 1, characterized in that replicated fragments of genomic DNA are
obtained by cloning into vectors based on single-stranded bacteriophages in the form of three subclone libraries with
inserts of 0.1 to 1 kb and 3 to 10 kb and inserts consisting of two parts separated from the genomic DNA by an
average of 50 to 200 kb, that they are replicated as individual subclones and as SC groups obtained by simultaneous
infection and that they are hybridized on the filter to which they are applied as a hybridization spot as uninterrupted
or cut out vector-insert DNAs of individual subclones and groups of subclones up to fragments of an average length
of 20 bp.

3. Procedure according to Claims 1 and 2, characterized in that the subfragments of the sequence of
the individual subclones or groups of subclones, obtained by overlapping positively hybridizing ONPs for the given
subclone or group of subclones, are arranged into a natural linear array by cyclic detection of overlapping subclones
based on the content of subfragments of the sequence of the starting subclone or group of subclones, which
subclones in a library of 0.1 to 1 kb show an average displacement of less than 100 kb.

4. Procedure according to Claims 1 and 2, characterized in that the subfragments of the sequence of
the individual subclone or group of subclones, obtained by overlapping positively hybridizing ONPs for the given
subclone or group of subclones, are arranged into a natural linear array by the procedure of competitive
hybridization with unlabeled and labeled oligonucleotide probes whereby first the filter hybridizes with a saturating
amount of unlabeled oligonucleotide probe, which contains all or part of the terminal, repeating oligonucleotide
sequences in the sequence subfragment for which it is desired to determine the following sequence subfragment and
then, with or without previous covalent bonding of this cold probe to the filter, separate hybridizations are carried
out with labeled oligonucleotide probes containing all or part of the repeating oligonucleotide sequence, so
that at least a part is common to that part of the repeating sequence which is contained in the unlabeled probe, and
the remainder of the nonrepeating sequences that follow the repeating sequence, from each sequence subfragment
that contains the repeating sequence, and the following sequence subfragment is determined as that whose labeled
oligonucleotide probe does not hybridize.

5. Procedure according to Claims 1-4, characterized in that the sequencing of a million bp mammalian
DNA is carried out with individual hybridization spots containing 1250 groups of 20, or an average of 20,
M13 subclones 0.5 kb in length, 700 M13 subclones 7 kb in length and 170 skipping M13 subclones which skip
on average 100 kb of the genomic DNA, and by hybridization of each spot with 1024 groups of 16 probes each of
the (A,T,C,G)N10(A,T,C,G) type wherein N10 are all the 10-mers that do not contain the G and C nucleotides,
with 23,040 groups of 16 probes each of the (A,T,C,G)N9(A,T,C,G) type where N9 are all the 9-mers containing
one or two C+G nucleotides, with 55,834 groups of 64 probes each of the (A,T,C,G)(A,T,C,G)N8(A,T,C,G) type
or of the (A,T,C,G)(A,T,C,G)N8(A,T,C,G)(A,T,C,G) type where N8 are all the 8-mers containing three or more
C+G nucleotides, and with 3725 groups of 16 probes each of the (A,T,C,G)Nm(A,T,C,G) type where Nm denotes
all monotonic sequences of the required lengths shorter than 18 bp and consisting of 1 to 7 nucleotides-long
repeating units.



Applicant:
(Signed:)

Prof. Dr. Vladimir Glisin

ABSTRACT



The conditions under which oligonucleotide probes hybridize only with fully homologous sequences are
known. By such hybridization and by arranging positively hybridizing probes via overlapping parts, the sequence
of the given DNA fragment is read. By simultaneous hybridization of DNA molecules of the single-stranded phage
vector-cloned insert, applied in the form of spots, with about 50,000 to 100,000 groups of probes the main type of
which is (A,T,C,G)(A,T,C,G)N8(A,T,C,G), information for computer-assisted determination of DNA sequences
of the complexity of the mammalian genome can be obtained. To obtain as complete sequences as possible, three
libraries in the vector based on the M13 phage are used: those with 0.5 kb inserts, those with 7 kb inserts and those
with inserts consisting of two sequences separated in the genomic DNA by an average of 100 kb. For one million
bp of genomic DNA, 25,000 0.5-kb subclones, 700 7-kb subclones and 170 skipping subclones are needed. The
0.5 kb subclones are applied to the filter in groups of 20, so that the total number of samples is 2120 per million
bp. The procedure can be readily and completely robotized for reading complex genomic DNA fragments or
molecules in a manufacturing plant.